WO 00/58507 PCT/GB00/01222

POLYNUCLEOTIDE SEQUENCING 

Field of the Invention 

This invention relates to the sequencing of polynucleotides. In particular, this invention discloses methods for determining the sequence of arrayed polynucleotides.

Background to the Invention

Advances in the study of molecules have been led, in part, by improvements in the technologies used to characterise the molecules or their biological reactions. In particular, the study of the nucleic acids DNA and RNA has benefited from developing technologies for sequence analysis and the study of hybridisation events.
One example of the technologies that have improved the study of nucleic acids is the development of fabricated arrays of immobilised nucleic acids. These arrays typically consist of a high-density matrix of polynucleotides immobilised onto a solid support material. Fodor et al., Trends in Biotechnology (1994) 12:19-26, describes ways of assembling the nucleic acids using a chemically sensitized glass surface protected by a mask but exposed at defined areas to allow attachment of suitably modified nucleotide phosphoramidites. Fabricated arrays may also be manufactured by the technique of "spotting" known polynucleotides onto a solid support at predetermined positions (e.g. Stimpson et al., PNAS (1995) 92:6379-6383).

A further development in array technology is the attachment of the polynucleotides to the solid support material via beads (microspheres).

For DNA arrays to be useful, their sequences must be determined. US 5302509 discloses a method to sequence polynucleotides immobilised on a solid support. The method relies on the incorporation of 3'-blocked bases A, G, C and T, each having a different fluorescent label, into a strand complementary to the immobilised polynucleotide in the presence of DNA polymerase. The polymerase incorporates a base complementary to the target polynucleotide but is prevented from further addition by the 3'-blocking group. The label of the incorporated base can then be determined and the blocking group removed to allow further polymerisation to occur.

However, removing the blocking groups after each cycle is time-consuming, and the removal must be performed with high efficiency.

Similarly, EP 0640146 discloses a polymerisation-based technique for sequencing DNA. The technique again requires removal of a blocking group prior to subsequent incorporation of nucleotides.
There is therefore a need for alternative methods for determining the sequence
of arrayed polynucleotides. 
Summary of the Invention 

In the general method of the invention, a target polynucleotide sequence can be determined by generating its complement in a polymerase reaction by the extension of a suitable primer, and characterising the successive incorporation of bases that generate the complement. The method requires the target sequence to be immobilised on a solid support, with multiple copies of the target being localised within discrete regions. Each of the different bases A, T, G or C is then brought, by sequential addition, into contact with the target, and any incorporation events are detected. Repeating the procedure with each of the bases allows the sequence of the complement to be identified, and thereby the target sequence also.

A distinguishing feature from the disclosure in US 5302509 is that the bases do not contain a blocking group preventing further polymerisation from occurring. In addition, the present invention requires the separate and serial addition of each of the different base types to the array, and, when fluorophores are used as the label, removal of the label can be carried out efficiently by photobleaching.

A further distinguishing feature, particularly relevant to EP 0640146, is that for each incorporation step, only a minor proportion of the bases are detectably-labelled. Consequently, among the many copies of the target, relatively few will incorporate a labelled base into the complement. This permits the straightforward identification of any sequence containing two or more consecutive bases of the same type. In this case, copies of the target will incorporate differing amounts of the labelled base into the complement, resulting in differing levels of signal. It is then possible to determine quantitatively the number of consecutive bases on the complement by detecting the different signal levels generated, as explained later.

Accordingly, a method for determining the sequence of a target polynucleotide on an array comprises the steps of:

(i) forming an array comprising multiple copies of each target polynucleotide;

(ii) contacting the array with a composition comprising one of the bases A, T, G or C under conditions that permit polymerisation to occur, wherein a minor proportion of the bases are detectably-labelled;

(iii) detecting the incorporation of a base onto the complement of the target after removal of non-incorporated bases; and

(iv) repeating steps (ii) and (iii) with each of the different bases until the sequence is determined.
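By way of illustration only, the cycle of steps (ii) to (iv) can be sketched as a small, noise-free simulation. The assumptions are not taken from the text: a fixed A, T, G, C addition order, 10,000 copies per region, 1% of each base labelled, and a detected signal equal to the expected number of labelled incorporations. The function names `run_length` and `sequence_by_cycles` are hypothetical.

```python
# Hypothetical toy model of steps (ii)-(iv); not the patent's own procedure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_length(template: str, pos: int, base: str) -> int:
    """Count consecutive template positions from `pos` that pair with `base`.

    Unblocked bases let the polymerase fill a whole homopolymer run in one cycle.
    """
    n = 0
    while pos + n < len(template) and COMPLEMENT[template[pos + n]] == base:
        n += 1
    return n


def sequence_by_cycles(template: str, copies: int = 10_000,
                       labelled_fraction: float = 0.01,
                       max_cycles: int = 1_000) -> str:
    """Read out the complement one base type per cycle.

    `template` is written in the order the polymerase copies it (3' to 5').
    The signal is the expected number of labelled incorporations (no noise),
    which is converted back into a run length.
    """
    unit = copies * labelled_fraction          # labels expected for a single base
    pos, read = 0, []
    for cycle in range(max_cycles):
        if pos >= len(template):
            break
        base = "ATGC"[cycle % 4]               # fixed addition order (assumption)
        run = run_length(template, pos, base)   # incorporation, then wash
        signal = copies * run * labelled_fraction
        detected = round(signal / unit)         # quantitative run-length decoding
        read.append(base * detected)
        pos += detected
    return "".join(read)


if __name__ == "__main__":
    complement = sequence_by_cycles("TTAGGCA")
    print(complement)                                   # AATCCGT
    print("".join(COMPLEMENT[b] for b in complement))  # TTAGGCA
```

Because the bases are unblocked, the two consecutive T positions at the start of the template are filled in a single A cycle and are resolved only through the size of the signal.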

According to one aspect of the invention, the label from incorporated bases may be removed either prior to the addition of bases having the same label or before it becomes difficult to detect incorporation.

According to a second aspect of the invention, when the label is a fluorophore, the fluorescence signal generated on nucleotide incorporation may be measured quantitatively, without the need to remove labels after each incorporation step. There is therefore provided a method for determining the sequence of a target polynucleotide as described above, wherein the fluorescence labels are not removed from the incorporated nucleotides, and subsequent detection of incorporation is carried out by measuring the stepwise increase in the fluorescence signal.
The advantage of this embodiment is that it does not require the step of photobleaching and may therefore be carried out quickly and efficiently.

Sequencing the polynucleotides on the array makes it possible to form a spatially addressable array. This may then be used for many different applications, including genotyping studies and other characterisation experiments.

The method of the present invention may be automated to provide very efficient and fast sequence determination.

Description of the Drawings 

Figure 1 represents a fluorescence (left) or optical (right) image generated in the presence (A) and absence (B) of polymerase enzyme; and

Figure 2 represents a fluorescence image generated from beads with fluorophore-labelled DNA attached (A) or with a fluorophore-labelled nucleotide incorporated into DNA using a polymerase (B).

Description of the Invention 

The method for determining the sequence of the arrayed polynucleotides is carried out by contacting the array separately with the different bases to form the complement of the target polynucleotide, and detecting incorporation. The method makes use of polymerisation, whereby a polymerase enzyme extends the complementary strand by incorporating the base complementary to that on the target. The polymerisation reaction also requires a specific primer to initiate polymerisation.

In each cycle, in which one base type is added to the array, only a minor proportion of the bases are detectably-labelled, i.e. less than 50% of the bases are detectably-labelled, preferably less than 20%. Therefore, only the incorporation of detectably-labelled bases can be monitored. The labelled bases are present at a fixed low concentration with respect to the non-labelled bases. The concentration may be chosen to permit a suitable incorporation rate of the labelled bases for efficient detection. For example, the concentration may be chosen to permit between 10% and 0.0001% incorporation of labelled bases, preferably between 5% and 0.01%, most preferably between 1% and 0.1%.
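The effect of the labelled fraction on the number of detectable labels per region can be estimated with simple arithmetic. The sketch below assumes a binomial model (every copy incorporates exactly one base in the cycle, each labelled independently) and uses 10,000 copies per region, the upper figure given later in this description; the function name `labelled_incorporations` is hypothetical.

```python
import math


def labelled_incorporations(copies: int, fraction: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of labelled bases in one region.

    Assumes every copy incorporates exactly one base in the cycle and that each
    incorporated base is labelled independently with probability `fraction`
    (a binomial model).
    """
    mean = copies * fraction
    sd = math.sqrt(copies * fraction * (1.0 - fraction))
    return mean, sd


if __name__ == "__main__":
    # Fractions spanning the ranges given above; 10,000 copies per region.
    for fraction in (0.05, 0.01, 0.001, 0.0001):
        mean, sd = labelled_incorporations(10_000, fraction)
        print(f"{fraction:.2%} labelled: about {mean:.0f} labels per region (sd {sd:.1f})")
```

At 1% about 100 labels are expected per region, whereas at 0.01% only about one is expected, which illustrates why the labelled fraction and the number of copies per region have to be chosen together.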

Using many copies of the same polynucleotide in discrete regions, it is possible to detect the incorporation of a labelled base quantitatively. For example, on incorporation of the adenosine nucleotide, a proportion of the polynucleotides will have a non-labelled adenosine nucleotide and a proportion will have a labelled adenosine nucleotide. Detecting the incorporation of the label will allow a sequence determination to be made. If two adenosine nucleotides are incorporated consecutively into the complementary strand, a proportion of the polynucleotide copies will incorporate two non-labelled adenosine nucleotides, a proportion will incorporate one labelled adenosine and one non-labelled adenosine, and a proportion will incorporate two labelled adenosine nucleotides. However, the ratio of labelled to unlabelled nucleotide will be such that very few strands will incorporate more than one labelled nucleotide. This is especially desirable when fluorescent labels are used, since two labels on the same strand may cause fluorescence quenching or loss of signal linearity. The label will therefore be distributed throughout the population of a given sequence. Consequently, there will be a quantitative difference in the signal generated within the population of the given sequence, and the incorporation of two consecutive bases of the same type can be detected from this quantitative difference in the signal.
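The argument above can be made quantitative under a simple binomial assumption (not stated in the text): if each incorporated base is labelled independently with probability p, a run of n identical bases gives a mean of n x p labels per strand, so the regional signal grows linearly with n, while the fraction of strands carrying two or more labels stays of order p squared. The function names are hypothetical.

```python
import math


def label_distribution(run: int, p: float) -> list[float]:
    """P(k labels on one strand), k = 0..run, for a homopolymer run of length `run`
    when each incorporated base is labelled independently with probability p."""
    return [math.comb(run, k) * p**k * (1 - p) ** (run - k) for k in range(run + 1)]


def expected_region_signal(run: int, p: float, copies: int) -> float:
    """Expected number of labels in a region: copies x run x p (linear in run)."""
    return copies * run * p


if __name__ == "__main__":
    p, copies = 0.01, 10_000  # illustrative values within the ranges given above
    for run in (1, 2, 3):
        multi = sum(label_distribution(run, p)[2:])  # strands with >= 2 labels
        print(run, expected_region_signal(run, p, copies), f"P(>=2 labels) = {multi:.1e}")
```

With p = 1%, a run of two doubles the expected regional signal, yet only about one strand in ten thousand carries two labels, so quenching between neighbouring labels should rarely arise.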
In the context of the invention, reference to the bases A, T, G and C is taken to be a reference to the deoxynucleoside triphosphates Adenosine, Thymidine, Guanosine and Cytidine, and to functional analogs thereof, including dideoxynucleoside triphosphates.

The terms "arrayed polynucleotides" and "polynucleotide arrays" are used 
herein to define an array of polynucleotides that are immobilised on a solid support 
material. The polynucleotides may be immobilised to the solid support indirectly
through a linker molecule, or may be attached to a particle, e.g. a microsphere, which 
is itself attached to a solid support material. 

An important requirement is that there are multiple copies of each target polynucleotide on the array. Typically, these will be in discrete, positioned regions on the solid support. Each discrete region may typically comprise several hundred to several thousand copies of the target polynucleotide; there may be, for example, up to 10,000 polynucleotide copies per region. The polynucleotides within each region preferably form a substantially uniform arrangement. This permits a high level of discrimination between individual polynucleotides, which may be preferable to resolve individual labels. However, the density of the polynucleotides is not necessarily of primary importance; the concentration of the labelled bases during the sequencing steps is also important, and this can readily be optimised by the skilled person.

The term "spatially addressable" is used herein to describe how different molecules may be identified on the basis of their position on an array.

The detection of an incorporated base may be carried out by using a confocal scanning microscope to scan the surface of the array with a laser, to image a fluorophore bound directly to the incorporated base. Alternatively, a sensitive 2-D detector, such as a charge-coupled device (CCD), can be used to visualise the individual signals generated. The use of such apparatus is known to the skilled person. However, other techniques such as scanning near-field optical microscopy (SNOM) are available which are capable of finer optical resolution, thereby permitting denser arrays to be used. For example, using SNOM, individual polynucleotides may be distinguished when separated by a distance of less than 100 nm, e.g. 10 nm x 10 nm. For a description of scanning near-field optical microscopy, see Moyer et al., Laser Focus World (1993) 29:10.

The polynucleotides that may be sequenced include DNA, RNA and synthetic 
alternatives such as PNA. 

The polynucleotides may be attached to the solid support by recognised means, including the use of biotin-avidin interactions or the use of amine linkages. In one embodiment, the polynucleotides are attached to the solid support via microscopic beads (microspheres), which may in turn be attached to the solid support by known means. The microspheres may be of any suitable size, typically in the range of from 10 nm to 100 nm in diameter.

Attachment via microspheres is a preferred embodiment as it allows discrete regions of polynucleotides to be generated easily on the array. Each microsphere may have multiple copies of a polynucleotide attached, and each microsphere can be resolved individually to determine incorporation events.

The method makes use of the polymerisation reaction to generate the complementary sequence of the target. The conditions necessary for polymerisation to occur will be apparent to the skilled person. For example, a polymerase enzyme may be used to extend the complementary strand, and different polymerases, including DNA polymerases and RNA polymerases, are known to those skilled in the art. For example, the Klenow fragment of E. coli DNA polymerase I or T7 DNA polymerase may be used. To carry out the polymerase reaction, it may be necessary first to anneal a primer sequence to the target polynucleotide, the primer sequence being recognised by the polymerase enzyme and acting as an initiation site for the subsequent extension of the complementary strand. Other conditions necessary for carrying out the reaction, including temperature and pH, will be apparent to those skilled in the art.

This polymerisation step is allowed to proceed for a time sufficient to allow incorporation of all the correct bases. This will depend on the efficiency of incorporation and can be determined by the skilled person. Bases that are not incorporated are then removed, for example by subjecting the array to a washing step, and detection of the incorporated labels may then be carried out.

Detection may be by conventional means; for example, if the label is a fluorescent moiety, detection may be carried out by optical microscopy, e.g. confocal scanning microscopy.

A preferred embodiment of the invention uses fluorophores as the label, and many examples of fluorophores that may be used are known in the prior art, e.g. tetramethylrhodamine (TMR).

After detection, the labels may be removed from the bases so that they do not interfere with the signal generated in the next cycle of incorporation. If the label is a fluorophore, it is possible to bleach the fluorophore by chemical means or through the use of a laser (photobleaching). Alternatively, the label may be removed by chemical or photochemical means.
The process of incorporating bases may then be repeated using each of the
different bases until the sequence has been determined. 

It may not always be necessary to remove the labels prior to the addition of the next base sample. Different bases may have distinguishable labels, and so it will only be necessary to remove incorporated labels prior to adding bases having an identical label.
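The rule that labels need only be removed before adding a base with an identical label can be expressed as a short scheduling sketch. It assumes, conservatively, that every cycle leaves some incorporated label on the array and that one removal step (e.g. photobleaching) clears all labels at once; the function name `bleach_points` and the label names are hypothetical.

```python
def bleach_points(order: str, label_of: dict[str, str]) -> list[int]:
    """Return the cycle indices before which incorporated labels must be removed.

    A removal step is needed only when the base about to be added carries a
    label already present on the array from an earlier, un-bleached cycle.
    Assumes every cycle leaves some label behind and that one removal step
    clears all labels at once.
    """
    present: set[str] = set()
    points: list[int] = []
    for cycle, base in enumerate(order):
        label = label_of[base]
        if label in present:
            points.append(cycle)   # remove labels before this cycle
            present.clear()
        present.add(label)
    return points


if __name__ == "__main__":
    distinct = {"A": "dye1", "T": "dye2", "G": "dye3", "C": "dye4"}
    shared = {b: "TMR" for b in "ATGC"}
    print(bleach_points("ATGCATGC", distinct))  # one removal, before cycle 4
    print(bleach_points("ATGCATGC", shared))    # removal before every later cycle
```

With four distinguishable labels, removal is needed only once per round of four bases; with a single shared label, such as TMR on every base, it is needed before every cycle after the first.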

In one embodiment, fluorescent labels are used and detection is carried out by 
optical means without the requirement for removing labels between incorporation 
steps. 

For example, a confocal microscope may be used to scan the array and measure quantitatively the stepwise increase in fluorescence after each cycle of incorporation. By measuring the increase in the amount of fluorescence after each cycle, rather than the absolute amount, it should be possible to determine whether two or more nucleotides have been incorporated consecutively onto the template. This method relies on using sensitive detectors (e.g. charge-coupled device detectors) to measure the increase in signal. Suitable apparatus for carrying out the method is available commercially and will be apparent to the skilled person.
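A minimal sketch of reading run lengths from the stepwise increase is given below. It assumes a linear detector response, no quenching, and a previously calibrated increase for one incorporated base per copy (`unit_step`); the readings are invented for illustration and the function names are hypothetical.

```python
def runs_from_cumulative(readings: list[float], unit_step: float,
                         baseline: float = 0.0) -> list[int]:
    """Turn cumulative fluorescence readings (one per cycle, labels never removed)
    into the number of bases incorporated in each cycle.

    unit_step is the signal increase for one incorporated base per copy,
    assumed calibrated beforehand; the response is assumed linear.
    """
    runs: list[int] = []
    previous = baseline
    for value in readings:
        step = value - previous            # use the increase, not the absolute level
        runs.append(max(0, round(step / unit_step)))
        previous = value
    return runs


def read_complement(runs: list[int], order: str = "ATGC") -> str:
    """Expand per-cycle run lengths into the complementary sequence."""
    return "".join(order[i % len(order)] * n for i, n in enumerate(runs))


if __name__ == "__main__":
    # Hypothetical cumulative readings for six cycles, one base adding ~100 counts.
    runs = runs_from_cumulative([100, 100, 305, 398, 398, 690], unit_step=100)
    print(runs, read_complement(runs))
```

Rounding each increment to the nearest whole multiple of `unit_step` absorbs small fluctuations, so an increase of 205 counts is read as two bases and one of 93 counts as one.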

In a separate embodiment of the invention, the labelled bases may be modified so that, on incorporation, no further bases may be added. Bases that carry out this chain-terminating function include the dideoxynucleoside triphosphates, as used in conventional Sanger sequencing (Proc. Natl. Acad. Sci. USA 74: 5463-5467, 1977).

Therefore, after each incorporation step, a proportion of the polynucleotides will incorporate a labelled base that prevents further chain extension, and the number of polynucleotides available for the polymerisation step will gradually decrease as the sequencing method proceeds. However, provided there are sufficient copies of the polynucleotide, and provided the concentration of the labelled, chain-terminating bases is sufficiently low, it should be possible to sequence the target polynucleotides.
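How quickly the pool of extendable strands is depleted can be estimated if each incorporated base is assumed (an assumption, not stated in the text) to be a labelled chain terminator independently with probability p. After n incorporations the surviving fraction is then (1 - p)^n. The function names are hypothetical.

```python
import math


def extendable_fraction(p_terminator: float, incorporated: int) -> float:
    """Fraction of strands still extendable after `incorporated` bases have been
    added, if each base is a labelled chain terminator with probability p_terminator."""
    return (1.0 - p_terminator) ** incorporated


def max_read_length(p_terminator: float, min_fraction: float) -> int:
    """Largest number of incorporations that keeps at least `min_fraction`
    of the strands extendable."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - p_terminator))


if __name__ == "__main__":
    p = 0.01  # illustrative terminator fraction
    print(f"after 100 bases: {extendable_fraction(p, 100):.3f} of strands extendable")
    print("bases before half the strands are terminated:", max_read_length(p, 0.5))
```

With 1% terminators, roughly a third of the strands remain extendable after 100 bases, and half remain after 68 bases, which is consistent with the requirement above for many copies and a low terminator concentration.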

The following experiment illustrates the invention. 
Example 

In this experiment, a fluorescently-labelled DNA molecule (SEQ ID NO. 1) was coupled directly to beads, and the level of fluorescence was measured using an inverted Nikon microscope with an ICCD detector in an epifluorescence set-up. In a separate reaction, an unlabelled DNA (SEQ ID NO. 2) was attached to beads, and a fluorescently-labelled nucleotide was incorporated onto the DNA (SEQ ID NO. 2) using a polymerase. By comparing the average level of fluorescence between the two sets of beads, the efficiency of incorporation of the fluorescently-labelled nucleotide was shown to be 89%. This was determined by diluting the fluorescent beads in unmodified beads so that each fluorescent bead could be detected individually.

By measuring the signal-to-noise ratio in the experiment, an estimate can be obtained of the fraction of nucleotides that can be labelled with a fluorophore and still be detected when incorporated. This fraction can be less than 1%, i.e. it is possible to detect incorporation when the concentration of fluorescently-labelled nucleotides is such that only 1% of the incorporated nucleotides are labelled and the remaining 99% are non-labelled.

The experiment is now described in more detail. 

DNA Coupling 

Carboxylic acid-modified beads (both non-porous polystyrene and silica) of sizes 0.5-2.9 µm were placed in solutions of Milli-Q water (typically 1 mg per 50 ml). 1-(3-Dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC) (1 mg) and the oligonucleotide (added to give a final concentration of 10 µM) were added, and the beads were agitated by vortexing and left for 12 h at room temperature. The beads were washed with 0.15 M NaOH and twice with TT buffer (250 mM Tris.HCl, pH 8.0, 0.1% Tween 20), heated at 80°C in TTE (250 mM Tris.HCl, pH 8.0, 0.1% Tween 20, 20 mM sodium EDTA) and rinsed with water. To achieve a dilute array, the beads were sonicated in 200 µl water and 2.5 µl was evaporated onto a heated slide.
Enzyme Incorporation 

A solution of the 51mer (SEQ ID NO. 3) (4 µM; 2 eqv) in hybridisation buffer (5 mM MgCl2, 7.5 mM DTT, 10 mM Tris.HCl (pH 7.6), 0.005% Triton X-100) (20 ml) was added to 0.05 mg of beads (containing SEQ ID NO. 2), which were heated to 90°C for 2 min and allowed to cool for 1 h. The fluorescent dUTP (400 µM stock, 0.5 µl, 10 µM; 4 eqv) was added. A fraction of the beads was removed as a washing control, and the polymerase (Sequenase) (0.5 µl, 6.5 units; one unit will incorporate 1 nmole dNTP in 30 s at 37°C) was added. The reaction was left at room temperature for 4 h, and the beads were washed with NaOH, TT and TTE buffers as above and arrayed onto a coverslip.



The oligos used in this study are as follows: 
5'-(TAMRA)AGCGTCGGCAGGTATCCCAA(C6-amino)-3' SEQ ID NO. 1

and unlabelled:

5'-amino-GTCATCGAACGTCGAGCCTCGCAGCCGTCCAACCAACTCA-3' SEQ ID NO. 2

and

3'-CAGTAGCTTGCAGCTCGGAGCGTCGGCAGGTTGGTTGAG SEQ ID NO. 3

as hybridised template.

Figure 1 shows the fluorescence image on the left and the optical image on the right when the experiment on the incorporation of fluorescently-labelled dUTP was performed in the presence (A) and absence (B) of the polymerase. It is clear that no fluorescence is detected in the absence of any enzyme.

Figure 2 shows the beads diluted in unmodified beads so that a quantitative analysis can be performed. The top images (A) show the fluorescence from the beads with fluorophore-labelled DNA attached, and the lower image (B) shows the level of fluorescence when the fluorophore-labelled nucleotide is incorporated into the DNA using a polymerase. The values of the fluorescence from the beads were compared:

(A) 3-TAMRA DNA: average counts/bead = 1956 (54 beads, +/-50%)

(B) unblocked carboxylic acid-modified beads: average counts/bead = 1739, i.e. 89% of (A) (88 beads, +/-40%)

This means that the efficiency of incorporation of the labelled nucleotide is 89%.
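The 89% figure is the ratio of the two mean counts per bead. The sketch below reproduces it and, under an assumption the text does not state (that each quoted +/- is the relative per-bead standard deviation), also propagates the standard errors of the two means. The function names are hypothetical.

```python
import math


def incorporation_efficiency(incorporated_mean: float, reference_mean: float) -> float:
    """Ratio of mean counts/bead for enzymatically labelled beads (B)
    to directly labelled beads (A)."""
    return incorporated_mean / reference_mean


def relative_sem(spread: float, n_beads: int) -> float:
    """Relative standard error of a mean, ASSUMING the quoted +/- is the
    relative per-bead standard deviation."""
    return spread / math.sqrt(n_beads)


eff = incorporation_efficiency(1739, 1956)
# Relative errors of a ratio add in quadrature.
rel_err = math.hypot(relative_sem(0.50, 54), relative_sem(0.40, 88))

if __name__ == "__main__":
    print(f"efficiency = {eff:.0%}")  # efficiency = 89%
    print(f"approx. +/- {eff * rel_err:.2f} (absolute, under the stated assumption)")
```

Under that assumption, the efficiency would be uncertain by roughly seven percentage points, which does not affect the conclusion that most template strands were extended.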

By comparing the signal-to-noise ratio in Figure 1 between the level of fluorescence when the enzyme is present and when it is absent, it is possible to estimate that the level of fluorescence could be reduced by a factor of 100-1000 while still allowing the detection of fluorescence above the background with adequate signal-to-noise. This means that experiments can be performed with the fluorophore-labelled nucleotides highly diluted in non-labelled nucleotides, so that only 1% of the incorporated nucleotides are fluorophore-labelled.
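The step from the 100-1000-fold reduction estimate to the labelled fraction is simple arithmetic: a signal that can be reduced by a factor F and still be detected allows a labelled fraction of 1/F. The sketch below also scales the 1739 counts/bead measured for enzymatic incorporation, assuming (not stated in the text) that the signal is proportional to the number of labels. The function names are hypothetical.

```python
def detectable_label_fraction(reduction_factor: float) -> float:
    """Smallest labelled fraction that keeps the signal above background, if a
    fully labelled signal can be reduced by `reduction_factor` and still be detected."""
    return 1.0 / reduction_factor


def counts_per_bead(full_counts: float, labelled_fraction: float) -> float:
    """Expected counts/bead when only `labelled_fraction` of incorporations carry
    a label (assumes the signal scales linearly with the number of labels)."""
    return full_counts * labelled_fraction


if __name__ == "__main__":
    for factor in (100, 1000):
        f = detectable_label_fraction(factor)
        print(f"reduction x{factor}: {f:.1%} labelled, "
              f"~{counts_per_bead(1739, f):.1f} counts/bead")
```

A 100-fold reduction corresponds to the 1% labelling quoted above; the 1000-fold end of the estimate would correspond to 0.1%.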
CLAIMS:

1. A method for determining the sequence of a target polynucleotide, comprising the steps of:

(i) contacting an array comprising multiple copies of the target polynucleotide with one of the bases A, T, G or C in a form and under conditions that permit polymerisation to occur, wherein a minor proportion of the base is detectably-labelled;

(ii) detecting the incorporation of the base onto the complement of the target after removal of non-incorporated base; and

(iii) repeating steps (i) and (ii) with each of the different bases until the sequence is determined.

2. A method according to claim 1, wherein the label is a fluorescent moiety.

3. A method according to claim 2, wherein step (ii) is carried out using optical means.

4. A method according to claim 3, wherein the optical means is a confocal scanning microscope.

5. A method according to any of claims 2 to 4, wherein step (ii) is carried out by measuring quantitatively the fluorescence signal generated on nucleotide incorporation.

6. A method according to any preceding claim, wherein the label is not removed from the incorporated nucleotides, and subsequent detection of incorporation is carried out by measuring the stepwise increase in the signal.

7. A method according to any of claims 1 to 5, wherein the label from incorporated bases is removed either prior to the addition of base having the same label or before it becomes difficult to detect incorporation.

8. A method according to claim 7, wherein the label is removed by photobleaching.

9. A method according to any preceding claim, wherein the proportion is less than 10%.

10. A method according to claim 9, wherein the proportion is less than 1%.

11. A method according to any preceding claim, wherein the polynucleotide is 
attached to the array via microspheres. 

12. A method according to any preceding claim, wherein the detectably-labelled 
bases are dideoxynucleoside triphosphates. 